8x Despreader/Correlator Enhanced Multiplier Opcode 

Conventions: 

° Naming: 1-bit values, both real and imaginary, are referred to as codes, and sometimes as PN codes. The real and imaginary parts of a complex value are known as {real-part, imaginary part}, {real, img}, {I, J}, and {I, Q}, with a preference for (real, img) and (I, Q). 

° Code mapping: we will adopt the convention where 0 -> +1 and 1 -> -1. This allows us to treat the one-bit code values as the signs of 1-bit integers. This is compliant with CDMA2000 but contrary to some common usage. An XOR can be used to convert from this mapping to the opposite. 

  ° Example: the code 01 implies 0 for the real part and 1 for the imaginary part. 

° Real-img ordering: we will adopt the convention that the img part of a complex number is allocated to the LSB or little-endian position. The motivation for this is to allow real on the left and img on the right when viewing 32-bit values as hex or binary displays. 

  ° Example: the code 01 encodes 1-j, assuming img or 'j' is in the LSB. 

° Earliest-latest ordering: we will adopt the convention that the earliest samples in time are assigned LSB slots. This is in line with naming samples in ascending order when written in time sequence. 

  ° Example: a time sequence of values on a port appears as D0:D1:D2:D3. 

° 1-bit * 8-bit complex multiplier format: we will adopt the convention that we implement a mathematical complex multiply assuming the input samples are preformatted as real+img, real-img pairs. Other conventions, such as real multiplies and complex conjugates of the imaginary part of the code, will require additional preformatting of the input data, but may be implemented as opcode options. 

  ° Example: {0,0}*{i,q} = {(i-q), (i+q)}; 



Additions: 

1-bit * 8-bit complex vector dot product definition: SUM(code[]*data[]) 

    complex_1 code[8]; 
    complex_8 data[8]; 
    complex_8 dotproduct = 0; 
    for (n = 0; n < nelm; n++) 
    { 
        dotproduct += code[n] * data[n]; 
    } 
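The dot product definition above can be sketched as runnable C. This is a minimal illustration, not the opcode itself: the struct layouts, the wider accumulator type `complex_acc`, and the helper names `code_sign` and `code_dot_product` are assumptions introduced here. It applies the 0 -> +1, 1 -> -1 code mapping and an ordinary complex multiply to data that has not been preformatted.

```c
#include <assert.h>
#include <stdint.h>

/* One-bit complex code: each field holds a single code bit (0 or 1). */
typedef struct { uint8_t re, im; } complex_1;
/* 8-bit complex data sample. */
typedef struct { int8_t re, im; } complex_8;
/* Assumption: a wider accumulator, since a sum of 8-bit products can exceed 8 bits. */
typedef struct { int32_t re, im; } complex_acc;

/* Code mapping from the conventions: bit 0 -> +1, bit 1 -> -1. */
static int code_sign(uint8_t bit)
{
    return bit ? -1 : 1;
}

/* SUM(code[n] * data[n]) over nelm elements, using a plain complex multiply. */
static complex_acc code_dot_product(const complex_1 *code,
                                    const complex_8 *data, int nelm)
{
    complex_acc acc = {0, 0};
    assert(nelm >= 0);
    for (int n = 0; n < nelm; n++) {
        int cr = code_sign(code[n].re);
        int ci = code_sign(code[n].im);
        acc.re += cr * data[n].re - ci * data[n].im;
        acc.im += cr * data[n].im + ci * data[n].re;
    }
    return acc;
}
```

For example, codes {0,0} and {1,1} applied to samples (3,4) and (1,2) give (-1+7j) + (1-3j) = 4j.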



2- CODE code encoding 

1-bit complex integer format - CODE.i, CODE.q 

CODE code value    Numerical meaning 
0                  +1.0 
1                  -1.0 

3- CODE format 

1-bit complex integer format - CODE[1:0] 

Bit        Numerical meaning 
CODE[1]    I, i, or real part 
CODE[0]    Q, j, or imaginary part 



4- Complex 16-bit data input format 

16-bit complex integer format - IN[31:0] 

Field        Numerical meaning 
IN[31:16]    I or real part 
IN[15:00]    Q or imaginary part 


5- Complex 8-bit data input format 0, 16-bit aligned 

8-bit complex integer format - IN[31:0] 

Field        Numerical meaning 
IN[23:16]    I or real part 
IN[07:00]    Q or imaginary part 


6- Dual Complex 8-bit data input format 1, 16-bit aligned 

Dual 8-bit complex integer format - IN[31:0] 

Field        Numerical meaning 
IN[31:24]    i1 or real part, sample 1 
IN[15:08]    q1 or imaginary part, sample 1 
IN[23:16]    i0 or real part, sample 0 
IN[07:00]    q0 or imaginary part, sample 0 
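The two 8-bit input formats above can be unpacked with a few shifts. The sketch below is illustrative only; the type `complex_8` and the helpers `field8`, `unpack_format0`, and `unpack_format1` are hypothetical names, and the bytes are assumed to be signed two's complement values.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int8_t re, im; } complex_8;

/* Extract the signed byte whose LSB sits at bit position lsb of a 32-bit word. */
static int8_t field8(uint32_t word, unsigned lsb)
{
    assert(lsb <= 24);
    unsigned b = (word >> lsb) & 0xFFu;
    /* Map 128..255 to -128..-1 without relying on implementation-defined casts. */
    return (int8_t)(b < 128u ? (int)b : (int)b - 256);
}

/* Format 0: IN[23:16] = I (real part), IN[07:00] = Q (imaginary part). */
static complex_8 unpack_format0(uint32_t in)
{
    complex_8 s;
    s.re = field8(in, 16);
    s.im = field8(in, 0);
    return s;
}

/* Format 1 (dual): sample 0 in IN[23:16], IN[07:00];
   sample 1 in IN[31:24], IN[15:08]. */
static void unpack_format1(uint32_t in, complex_8 out[2])
{
    out[0].re = field8(in, 16);
    out[0].im = field8(in, 0);
    out[1].re = field8(in, 24);
    out[1].im = field8(in, 8);
}
```

With the word 0x01FE037F, format 1 yields sample 0 = (-2, 127) and sample 1 = (1, 3); format 0 reads only the sample 0 fields.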
CODE formats 

Each CODE bit pair is used to drive a mux/negate unit connected to two input bytes in a 32-bit word. The 4XDESP inputs use the 0:i0:0:q0 word layout and the 8XDESP inputs use the i1:i0:q1:q0 layout. 

CODE bit    4XDESP data input      8XDESP data input      16XCorrelate data input 
0,1         I0[23:16],I0[07:00]    I0[23:16],I0[07:00]    I0[15:08],I0[07:00] 
2,3         I1[23:16],I1[07:00]    I0[31:24],I0[15:08]    D0[15:08],D0[07:00] 
4,5         I2[23:16],I2[07:00]    I1[23:16],I1[07:00]    D1[15:08],D1[07:00] 
6,7         I3[23:16],I3[07:00]    I1[31:24],I1[15:08]    D2[15:08],D2[07:00] 
8,9         -                      I2[23:16],I2[07:00]    D3[15:08],D3[07:00] 
10,11       -                      I2[31:24],I2[15:08]    D4[15:08],D4[07:00] 
12,13       -                      I3[23:16],I3[07:00]    D5[15:08],D5[07:00] 
14,15       -                      I3[31:24],I3[15:08]    D6[15:08],D6[07:00] 
16,17       I0[23:16],I0[07:00]    I0[23:16],I0[07:00]    D7[15:08],D7[07:00] 
18,19       I1[23:16],I1[07:00]    I0[31:24],I0[15:08]    D8[15:08],D8[07:00] 
20,21       I2[23:16],I2[07:00]    I1[23:16],I1[07:00]    D9[15:08],D9[07:00] 
22,23       I3[23:16],I3[07:00]    I1[31:24],I1[15:08]    D10[15:08],D10[07:00] 
24,25       -                      I2[23:16],I2[07:00]    D11[15:08],D11[07:00] 
26,27       -                      I2[31:24],I2[15:08]    D12[15:08],D12[07:00] 
28,29       -                      I3[23:16],I3[07:00]    D13[15:08],D13[07:00] 
30,31       -                      I3[31:24],I3[15:08]    D14[15:08],D14[07:00] 
[Figure: Complex Vector Multiply units T0, T1. Input data busses not shown.] 
CODE multiply routing 

A mux is employed to route CODE bits to the correct CODE multiply unit (the mux-negate unit) with the truth table below. The following table summarizes the routing to 32 mux-negate units as a function of opcode. The 16XADD and 16XASUB opcodes could also be implemented in the mux/neg block instead of the mux. 

CODE (real, img)    mapping    result.real    result.img 
00                  +1, +1     +r             +i 
01                  +1, -1     +i             -r 
10                  -1, +1     -i             +r 
11                  -1, -1     -r             -i 

OPCODE             Despreader    4XDESP       8XDESP       16XCorrelate 
mux negate unit                  C src bit    C src bit    C src bit 
x00                T0.img        c[0,1]       c[0,1]       c[0,1] 
x01                T0.img        c[2,3]       c[4,5]       c[2,3] 
x02                T0.img        c[4,5]       c[8,9]       c[4,5] 
x03                T0.img        c[6,7]       c[12,13]     c[6,7] 
x04                T0.img        -            c[2,3]       c[8,9] 
x05                T0.img        -            c[6,7]       c[10,11] 
x06                T0.img        -            c[10,11]     c[12,13] 
x07                T0.img        -            c[14,15]     c[14,15] 
y08                T0.real       c[0,1]       c[0,1]       c[0,1] 
y09                T0.real       c[2,3]       c[4,5]       c[2,3] 
y10                T0.real       c[4,5]       c[8,9]       c[4,5] 
y11                T0.real       c[6,7]       c[12,13]     c[6,7] 
y12                T0.real       -            c[2,3]       c[8,9] 
y13                T0.real       -            c[6,7]       c[10,11] 
y14                T0.real       -            c[10,11]     c[12,13] 
y15                T0.real       -            c[14,15]     c[14,15] 
x16                T1.img        c[16,17]     c[16,17]     c[16,17] 
x17                T1.img        c[18,19]     c[20,21]     c[18,19] 
x18                T1.img        c[20,21]     c[24,25]     c[20,21] 
x19                T1.img        c[22,23]     c[28,29]     c[22,23] 
x20                T1.img        -            c[18,19]     c[24,25] 
x21                T1.img        -            c[22,23]     c[26,27] 
x22                T1.img        -            c[26,27]     c[28,29] 
x23                T1.img        -            c[30,31]     c[30,31] 
y24                T1.real       c[16,17]     c[16,17]     c[16,17] 
y25                T1.real       c[18,19]     c[20,21]     c[18,19] 
y26                T1.real       c[20,21]     c[24,25]     c[20,21] 
y27                T1.real       c[22,23]     c[28,29]     c[22,23] 
y28                T1.real       -            c[18,19]     c[24,25] 
y29                T1.real       -            c[22,23]     c[26,27] 
y30                T1.real       -            c[26,27]     c[28,29] 
y31                T1.real       -            c[30,31]     c[30,31] 
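The routing table follows a regular pattern that can be written as a small function. The sketch below is one way to express it, assuming the unit numbering 0 to 31 used in the table; the enum `desp_opcode` and the function `code_src_bit` are hypothetical names, and a return value of -1 stands for an unused unit.

```c
#include <assert.h>

typedef enum { OP_4XDESP, OP_8XDESP, OP_16XCORR } desp_opcode;

/* Returns the index k of the CODE bit pair c[k, k+1] routed to mux-negate
   unit `unit` (0..31), or -1 if the unit is unused for this opcode. */
static int code_src_bit(int unit, desp_opcode op)
{
    assert(unit >= 0 && unit < 32);
    int base = (unit / 16) * 16; /* units 0..15 feed T0, units 16..31 feed T1 */
    int k = unit % 8;            /* position inside an img (x) or real (y) group */

    switch (op) {
    case OP_4XDESP:
        /* only the first four units of each group are used */
        return k < 4 ? base + 2 * k : -1;
    case OP_8XDESP:
        /* first four units take every other pair, last four take the rest */
        return k < 4 ? base + 4 * k : base + 4 * (k - 4) + 2;
    case OP_16XCORR:
        /* pairs are consumed in order */
        return base + 2 * k;
    }
    return -1;
}
```

For instance, `code_src_bit(5, OP_8XDESP)` gives 6, matching the c[6,7] entry for unit x05, and the real (y) units receive the same bit pairs as the img (x) units of the same tree.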
Multiply Unit 

CODE Multiply format 

The code multiply unit multiplies 2 complex values: a 1-bit complex value known as the code and an 8-bit complex pair known as data. 

This leads to the table: 

CODE (real, img)    result.real =          result.img = 
00 ->  1,  1          data.r - data.i;       data.r + data.i; 
01 ->  1, -1          data.r + data.i;     - data.r + data.i; 
10 -> -1,  1        - data.r - data.i;       data.r - data.i; 
11 -> -1, -1        - data.r + data.i;     - data.r - data.i; 

For efficient implementation this is to be implemented in the despreader by: 

1) requiring the data to be preformatted as: data.r = r-i, data.i = r+i; 

2) using a mux followed by a negate to implement the multiply as follows: 

CODE (real, img)    result.real    result.img 
00 ->  1,  1          r - i          r + i 
01 ->  1, -1          r + i        - (r - i) 
10 -> -1,  1        - (r + i)        r - i 
11 -> -1, -1        - (r - i)      - (r + i) 

If a 45 degree rotation and scaling is allowed, as is acceptable when pilot and data are decoded together, the preformatting can be dropped to yield the following function table: 

CODE (real, img)    result.real    result.img 
00 ->  1,  1          r              i 
01 ->  1, -1          i            - (r) 
10 -> -1,  1        - (i)            r 
11 -> -1, -1        - (r)          - (i) 
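The mux-then-negate scheme can be checked against a full complex multiply. The following is a minimal sketch under the document's conventions (CODE[1] is the real code bit, CODE[0] the imaginary code bit, bit 0 -> +1, bit 1 -> -1); the type `cplx` and the function names are assumptions made for illustration.

```c
#include <assert.h>

typedef struct { int re, im; } cplx;

/* Reference: full complex multiply of a one-bit code by a data sample. */
static cplx code_mul_reference(unsigned code, cplx d)
{
    assert(code < 4);
    int cr = (code & 2) ? -1 : 1; /* CODE[1] is the real part */
    int ci = (code & 1) ? -1 : 1; /* CODE[0] is the imaginary part */
    cplx out;
    out.re = cr * d.re - ci * d.im;
    out.im = cr * d.im + ci * d.re;
    return out;
}

/* Preformatting step: data.r = r - i, data.i = r + i. */
static cplx preformat(cplx d)
{
    cplx p;
    p.re = d.re - d.im;
    p.im = d.re + d.im;
    return p;
}

/* Mux followed by negate. Codes 01 and 10 swap the two inputs; CODE[1]
   negates the real output and CODE[0] negates the imaginary output. */
static cplx code_mul_muxneg(unsigned code, cplx p)
{
    assert(code < 4);
    int swap = (code ^ (code >> 1)) & 1;
    int a = swap ? p.im : p.re;
    int b = swap ? p.re : p.im;
    cplx out;
    out.re = (code & 2) ? -a : a;
    out.im = (code & 1) ? -b : b;
    return out;
}
```

Feeding preformatted data to `code_mul_muxneg` reproduces the reference product for all four codes. Feeding raw data to the same function gives the rotated and scaled variant of the last table; for example, code 01 applied to (5, -3) yields (-3, -5).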

The real part is close to the UMTS encoding: 

bits    UMTS 
00      +i 
01      +r 
10      -r 
11      -i 
Mux-negate Options 

Besides complex multiply, other modes of the mux-negate units are useful. These modes are: 

complex - the normal multiply 
complex-conjugate - complement the imaginary part of the code before the multiply 
zero - force the code to 00 to effect an adder 
real - use the real part of the code to negate the real part of the data and the img part of the code to negate the img part of the data 

The following truth table would apply if we decided to implement these additional modes (assume the data at the mux is called real, img). Assuming the 4 modes are adopted, we can use input mux bits for the source of the bits. 

mode           code    real result    img result 
complex        00      real           img 
complex        01      img            -real 
complex        10      -img           real 
complex        11      -real          -img 
complex-cnj    01      real           img 
complex-cnj    00      img            -real 
complex-cnj    11      -img           real 
complex-cnj    10      -real          -img 
real-r*        0x      real 
real-r         1x      -real 
real-i**       x0                     img 
real-i         x1                     -img 
zero           xx      real           img 

* real mode selects the real input and uses code[1] to control negation for the real output. 

** real mode selects the img input and uses code[0] to control negation for the img output. 
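The four modes can be modelled by adjusting the code bits before a common mux-negate step. This is a sketch of one possible reading of the mode descriptions, not the hardware decode; the enum `muxneg_mode`, the type `cplx`, and the function `muxneg` are hypothetical names.

```c
#include <assert.h>

typedef struct { int re, im; } cplx;
typedef enum { MODE_COMPLEX, MODE_COMPLEX_CONJ, MODE_ZERO, MODE_REAL } muxneg_mode;

/* Apply one mux-negate unit pair to a data sample (real, img). */
static cplx muxneg(muxneg_mode mode, unsigned code, cplx d)
{
    assert(code < 4);
    cplx out;

    if (mode == MODE_REAL) {
        /* No swap: code[1] negates the real part, code[0] the img part. */
        out.re = (code & 2) ? -d.re : d.re;
        out.im = (code & 1) ? -d.im : d.im;
        return out;
    }
    if (mode == MODE_ZERO)
        code = 0;       /* force code to 00: data passes through for adding */
    if (mode == MODE_COMPLEX_CONJ)
        code ^= 1u;     /* complement the imaginary code bit */

    /* Normal complex path: codes 01 and 10 swap inputs, then negate. */
    int swap = (code ^ (code >> 1)) & 1;
    int a = swap ? d.im : d.re;
    int b = swap ? d.re : d.im;
    out.re = (code & 2) ? -a : a;
    out.im = (code & 1) ? -b : b;
    return out;
}
```

With data (7, 2), complex mode and code 01 give (2, -7), while complex-conjugate mode with the same code gives (7, 2), since the complemented code is 00.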
Despreading 

Despreading is used in rake receivers. The common case used in both 1xRTT and UMTS is to despread a symbol against 2 separate codes to recover pilot and data channels. The circuit below is used as a test case to evaluate despreading performance for a variety of architectures. It despreads 4 16-bit or 8 8-bit complex input samples known as "chips" to form two complex results corresponding to the pilot and data outputs of a rake despreader. Each input is stored as 8-bit complex data which may be unpacked to 16-bit complex data. The input data is assumed to be stored in separate LSM memories, and is addressed in such a fashion as to read out a contiguous neighborhood of 4 samples separated by a 1 chip time delay. 

[Figure: despreader test circuit with pilot_code input and pilot_out and data out outputs. Each multiplier symbol is a 1-bit complex multiply.] 

Vermont architecture enhancements which increase the number of chips per tile are: 

■ the address generator 
■ support for 8-bit unpacking 
■ support for addsub16 and subadd16 instructions 

Vermont plus architecture enhancements which increase the number of chips per tile are: 

■ adder tree in 2X multipliers 
■ despreader opcode in enhanced Vermont 

The data storage format for input data is important for efficiency. In some cases increased performance can be obtained if data is stored in memory as 16-bit data or in a redundant form of 8-bit data. The effect of various hardware options is summarized in Table 1 and the implementations are summarized in Table 2. 

Table 1 - Number of chips per tile for despreading data in memory against 2 CODE codes 
function    data type    CS2112    Vermont    Vermont 2Xmul    Vermont SBBA    Vermont EX 
despread    8-bit*       1         1.4        2.3              -               4 
despread    16-bit       1         1.75       3.5              -               4 
despread    2X8-bit**    1.4       -          -                3.5             8 
corr        8-bit        52        64         64               64              192 

* stores 8-bit complex data in memory as a 32-bit word organized as: 
+i:-q:+q:+i; 

** stores 8-bit complex data in memory as 32-bit words organized as: 
i0:q0:i1:q1, 
i1:q1:i2:q2; 



Table 2 - The table below details the DPU usage for the despreading modes used above: 

format    chip      dpu0                 dpu1              dpu2    dpu3    mult             nDPU    nDPU ucL    nDPU    chip/tile 
8-bit     2112      mem                  unpack, negate    swap    tree    -                4       3           7       1 
8-bit     V         mem, unpack          negate, swap      tree    -       -                3       2           5       1.4 
8-bit     V2x       mem, unpack          negate, swap      -       -       tree             2       1           3       2.3 
8-bit     Vex       mem, unpack          -                 -       -       codemult tree    1       0           1       4 
16-bit    2112      mem                  negate            swap    tree    tree             4       3           7       1 
16-bit    V         mem, negate, swap    tree              -       -       -                2       2           4       1.75 
16-bit    V2x       mem, negate, swap    -                 -       -       tree             1       1           2       3.5 
16-bit    Vex       mem                  -                 -       -       codemult tree    1       0           1       4 
2X8bit    2112      mem                  negate, swap      tree    -       -                3       2           5       1.4 
2X8bit    V sbba    mem, sbba            -                 -       -       tree             2       2           4       3.5 
2X8bit    Vex       mem                  -                 -       -       codemult tree    1       0           1       8 

1xRTT / UMTS Rake Receiver channel count 

Based on the despreading performance and a 150 MHz clock for Vermont, the estimated 1xRTT and UMTS rake receiver channel count is summarized below: 

                  CS2112 DPUs    CS2112 channels    Vermont channels    Vermont 2Xmul channels    Vermont EX channels 
1xRTT channels    50             50                 75                  100                       150-200 
UMTS channel      32             16                 24                  32                        32-48 
The diagram below implements a 4-chip despreader for two different CODE codes. 

[Figure: 4-chip despreader with data input din, taps I0 to I3, and I+Q, I-Q preformatting.] 



16-bit implementation of despreading opcode 

I,Q * CODE 

CODE    O[31:16] =      O[15:0] = 
00      -H = -(I-Q)     -L = -(I+Q) 
01      -L = -(I+Q)     H = (I-Q) 
10      L = (I+Q)       -H = -(I-Q) 
11      H = (I-Q)       L = (I+Q) 

CODE (real, img)    result.real    result.img 
00 -> -1, -1        - (r - i)      - (r + i) 
01 -> -1,  1        - (r + i)        r - i 
10 ->  1, -1          r + i        - (r - i) 
11 ->  1,  1          r - i          r + i 
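The 16-bit table can be sketched as a select-and-negate over H = I-Q and L = I+Q, packed into one 32-bit output word. Note that this table uses the opposite code mapping (00 -> -1,-1). The function name `desp16` and the truncation of each result to 16 bits are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One chip of the 16-bit despreading opcode: returns O[31:16] (real result)
   and O[15:0] (imaginary result) packed into a 32-bit word. */
static uint32_t desp16(unsigned code, int16_t i, int16_t q)
{
    assert(code < 4);
    int32_t h = (int32_t)i - q; /* H = I - Q */
    int32_t l = (int32_t)i + q; /* L = I + Q */
    int32_t hi, lo;

    switch (code) {
    case 0:  hi = -h; lo = -l; break; /* code (-1, -1) */
    case 1:  hi = -l; lo =  h; break; /* code (-1, +1) */
    case 2:  hi =  l; lo = -h; break; /* code (+1, -1) */
    default: hi =  h; lo =  l; break; /* code (+1, +1) */
    }
    /* Assumption: each half is truncated to 16 bits (two's complement). */
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}
```

For I = 5 and Q = 3, code 11 packs H = 2 and L = 8 as 0x00020008, and code 00 packs the negated pair as 0xFFFEFFF8.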
[Figure: 8-bit correlation circuits with inputs I0 to I3 and outputs O0, O1.] 

Correlation circuits. Circuit 3 is implemented as the correlation opcode. 
A despreader tree can be constructed to implement a dual 4-chip despreader for 16-bit data and a dual 8-chip despreader for 8-bit data. 4 despreader trees are needed, one for each 16-bit output field. 

Function             Output       Function 
Despreader Tree 0    O0[15:00]    real - i 
Despreader Tree 1    O0[31:16]    imaginary - q 
Despreader Tree 2    O1[15:00]    real - i 
Despreader Tree 3    O1[31:16]    imaginary - q 



[Figure: direct input and input muxes 0 to 3 select among bytes [07:00], [23:16], [15:08], and [31:24] of inputs i0 to i3; the tree output is O0[15:0].] 

Fig 2 - Despread Tree 0 
Despreader Trees with input delay 

Adding a separate input mux and delay chain enables a dual correlation function to be implemented with only one external DPU. For this mode, output O0 is the sum C0 + C1. 

Function             Function         Output - despread    Output - correlation    Chain input           Chain output 
Despreader Tree 0    real - i         O0[15:00]            C0[15:00]               I0[23:16],I0[7:0]     chain[23:16],[7:0] 
Despreader Tree 1    imaginary - q    O0[31:16]            C0[31:16] 
Despreader Tree 2    real - i         O1[15:00]            C1[15:00]               chain[23:16],[7:0]    O1[15:00] 
Despreader Tree 3    imaginary - q    O1[31:16]            C1[31:16] 
[Figure: despread tree with input delay chain. Direct input, input muxes 0 to 3, and the chain source select among bytes [07:00], [23:16], [15:08], and [31:24] of inputs i0 to i3, with code inputs, output, and chain output.] 

Fig 2 - Despread Tree 0 
Physical Layout 

                                    elements per output    total 
4:1 Muxes 8-bits                    8                      32 
pipe regs 8-bit                     16                     64 
XOR 8-bit                           8                      32 
adders 8-bit                        4                      16 
adders 9-bit                        2                      8 
adders 10-bit                       3                      12 
Total blocks                        352                    1408 
Total blocks in 4:1 mux and pipe    192                    768 

Loads per input = 4 

1408 * 100 um^2 = 0.1408 mm^2 



[Figure: physical layout floorplan with inputs i0 to i3, mux blocks m00 to m31, adder blocks a0 to a7 per row, and outputs O0[31:16], O1[31:16].] 

16 rows at 10 um each?? 
The 16-bit output of each 'real' despreader tree is packed with the output of the corresponding 'imaginary' despreader tree: the output of Tree 0 is packed with the output of Tree 1, and Tree 2 with Tree 3. The final add is performed inside the 2x multiplier in 4ADD16 (packed 16-bit) mode. 

An add-one signal decoded inside the despreader is used to determine the other operand of the final add. The operand could be either zero or 2 packed 16-bit '0001' values. 
[Figure: output stage with input mux 2, 32-bit chain output, Outmux0, and Outmux1.] 

The 32-bit chain output is added with all zeros in the 2x mult before being sent to output mux 1. The two 32-bit packed outputs C0 and C1 are added together before being sent to output mux 0. 

A 5-bit opcode is used in the enhanced multiplier (both 2xmult and desp/corr) to decode 9 2xmult modes and 12 desp/corr modes, as shown in the following table. 



Mode                   Bit[4]    Bit[3]    Bit[2]    Bit[1]    Bit[0] 
4xdesp8 complex        1         1         1         0         0 
4xdesp8 comp-conjug 
4xdesp8 zero 
4xdesp8 real 
8xdesp8 complex 
8xdesp8 comp-conjug 
8xdesp8 zero 
8xdesp8 real 
Corr complex 
Corr comp-conjug 
Corr zero 
Corr real 
2MULT 
4ADD32 
4ADD16 
4MULT 
4MULTSUM 
4MULT2SUM 
4FIR 
CMULT 
CMULT16 
There is an output register in each of the desp/corr trees. 

The PN code will be registered after the 2-to-1 input scrambling mux. 